-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move Job objects to postgres #368
Conversation
src/app.py
Outdated
@@ -492,11 +492,11 @@ def matches( | |||
|
|||
@app.get( | |||
"/api/job/{job_id}", | |||
response_model=JobSchema, | |||
response_model=Job, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're fine with returning the whole row from db, right? I.e. there's no sensitive stuff there that users shouldn't have access to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Because if that was the case, we could define a schema for reading the job object)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, for postgres migration I had to define internal_id
for the Job object. It's not really sensitive, but by design it's not necessary for anything. I'll think about a Read object
@@ -67,55 +68,36 @@ def __schedule(self, agent: str, task: Any, *args: Any) -> None: | |||
|
|||
def get_job_ids(self) -> List[JobId]: | |||
"""Gets IDs of all jobs in the database""" | |||
return [key[4:] for key in self.redis.keys("job:*")] | |||
with Session(self.engine) as session: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're probably gonna be writing this a lot, should we maybe create a helper decorator like with self.get_session()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think it'll be useful to even have something like
self.execute(update(Job).where(...))
to handle commits etc automaticaly (this is a very common pattern). I'll do a separate refactor PR in a while
@@ -67,55 +68,36 @@ def __schedule(self, agent: str, task: Any, *args: Any) -> None: | |||
|
|||
def get_job_ids(self) -> List[JobId]: | |||
"""Gets IDs of all jobs in the database""" | |||
return [key[4:] for key in self.redis.keys("job:*")] | |||
with Session(self.engine) as session: | |||
jobs = session.exec(select(Job)).all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably could optimize this by selecting just the required row. But if there's not that many jobs and no automatic table joins it's probably not worth the effort at this moment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean just the required collumn? Yeah, I've thought about this but when I wrote it I only wanted to have it working and go back to it later.
It's not worth refactoring IMO, because this function is a wrong abstraction. It's used in some places, for example like this:
for job in db.get_job_ids():
job.get_job()
I didn't change it right away to avoid changes outside of the db.py
but the database abstraction need to be changed significantly later.
{"status": "cancelled", "finished": int(time())}, | ||
) | ||
def cancel_job(self, job: JobId, error=None) -> None: | ||
"""Sets the job status to cancelled, with optional error message""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens to the error message though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice (refactoring gone wrong)
|
||
def get_job(self, job: JobId) -> JobSchema: | ||
def get_job(self, job: JobId) -> Job: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the correct typing should be Optional[Job]
most likely?
And then we'd want to use one_or_none()
.
Unless sqlalchemy.orm.exc.NoResultFound
is handled somewhere down the line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not handled, but this should raise an exception if the job with a specified ID doesn't exist. Does it make sense? 🤔 It's used mostly in context where the UID is known to be good, or when it's user input and if it's bad it should fail
self.redis.hmset(f"job:{job}", {"status": "removed"}) | ||
with Session(self.engine) as session: | ||
session.execute( | ||
update(Job).where(Job.id == job).values(status="removed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some kind of enum/consts for the query statuses at some point probably?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I have it in the TODO, saving it for a refactor 🙏 (I'll need to check how to do this in sqlalchemy but I'll check other projects I guess)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created an issue out of it: #370
"taints": json.dumps(taints), | ||
} | ||
with Session(self.engine) as session: | ||
obj = Job( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could theoretically provide the defaults in the model declaration, but since this is the only initialization at this moment it's probably not worth it at the moment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question 🤔 depends on which one is clearer. Having defaults in the model for things like files_in_progress makes sense. When have a proper database I consider splitting this into several smaller objects (to avoid one huge job object that counts everything).
Co-authored-by: Michał Praszmo <[email protected]>
f5b724b
to
eae7f24
Compare
Your checklist for this pull request
What is the current behaviour?
Job objects are kept in redis
What is the new behaviour?
Job objects are kept in postgres.
I'm consciously trying to keep everything very self-contained and don't want to change a single line outside of the Database abstraction. It's very useful to have this layer when turning around the whole storage layer.
Related to #74